Alignment of BLAST High-scoring Segment Pairs Based on the Longest Increasing Subsequence Algorithm

Author

  • Hongyu Zhang
Abstract

MOTIVATION The popular BLAST algorithm is based on a local similarity search strategy, so its high-scoring segment pairs (HSPs) carry no global alignment information. When scientists use BLAST to search for a target protein or DNA sequence in a huge database such as the human genome map, repeated fragments, homologues or pseudogenes in the genome often fill the BLAST result with redundant HSPs. A computational strategy is therefore needed to alleviate this problem. RESULTS In the gene discovery group of Celera Genomics, I developed a two-step method, i.e. a BLAST step plus an LIS step, to align thousands of cDNA and protein sequences to the human genome map. The LIS step is based on a mature computational algorithm, the Longest Increasing Subsequence (LIS) algorithm. The idea is to use the LIS algorithm to find the longest series of consecutive HSPs in the BLAST output. Such a BLAST+LIS strategy can be used as an independent alignment tool or as a complementary tool for other alignment programs such as Sim4 and GenWise. It can also work as a general-purpose BLAST result processor in all sorts of BLAST searches. Two examples from Celera are shown in this paper.
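
The LIS step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the details of the method, so everything below is an assumption made for illustration: the `HSP` record with only `q_start` and `s_start` fields, the function name `longest_colinear_chain`, and the choice to sort HSPs by query position and then take the longest strictly increasing run of subject positions. It is not the implementation used at Celera.

```python
from bisect import bisect_left
from typing import List, NamedTuple


class HSP(NamedTuple):
    # Hypothetical minimal record; real BLAST output carries more fields.
    q_start: int  # start position on the query sequence
    s_start: int  # start position on the subject (genome) sequence


def longest_colinear_chain(hsps: List[HSP]) -> List[HSP]:
    """Return a longest chain of HSPs whose subject starts strictly increase
    when the HSPs are ordered by query start (patience-sorting LIS)."""
    # Ties on q_start are ordered by descending s_start so that two HSPs
    # with the same query start can never both enter the chain.
    ordered = sorted(hsps, key=lambda h: (h.q_start, -h.s_start))
    tails: List[int] = []     # tails[k]: smallest s_start ending a chain of length k + 1
    tail_idx: List[int] = []  # index in `ordered` of the HSP that ends that chain
    prev = [-1] * len(ordered)  # predecessor links for backtracking

    for i, h in enumerate(ordered):
        k = bisect_left(tails, h.s_start)
        if k == len(tails):
            tails.append(h.s_start)
            tail_idx.append(i)
        else:
            tails[k] = h.s_start
            tail_idx[k] = i
        prev[i] = tail_idx[k - 1] if k > 0 else -1

    # Walk the predecessor links back from the end of the longest chain.
    chain: List[HSP] = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        chain.append(ordered[i])
        i = prev[i]
    return chain[::-1]


if __name__ == "__main__":
    hits = [HSP(1, 100), HSP(50, 5000), HSP(60, 160),
            HSP(120, 220), HSP(200, 90), HSP(210, 300)]
    for hsp in longest_colinear_chain(hits):
        print(hsp)
```

In this toy input, the hits at subject positions 5000 and 90 are out of order relative to their neighbours and are dropped, which mimics how redundant HSPs from repeats or pseudogenes might be filtered. A fuller treatment would probably also consider HSP end coordinates, strand and score, none of which are modelled here.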


Similar resources

Consequences of Faster Alignment of Sequences

The Local Alignment problem is a classical problem with applications in biology. Given two input strings and a scoring function on pairs of letters, one is asked to find the substrings of the two input strings that are most similar under the scoring function. The best algorithms for Local Alignment run in time that is roughly quadratic in the string length. It is a big open problem whether subs...


A Distribution Function Arising in Computational Biology

Karlin and Altschul, in their statistical analysis for multiple high-scoring segments in molecular sequences, introduced a distribution function which gives the probability that there are at least r distinct and consistently ordered segment pairs all with score at least x. For long sequences this distribution can be expressed in terms of the distribution of the length of the longest increasing subseque...


Journal of Clinical and Diagnostic Research

Various algorithms are in use in medical processes to improve the speed, sensitivity and accuracy of the computations and analyses involved in those experiments. The aim of this paper is to suggest three improvements, namely Multi Hit, Dropoff percentage and NCM-2 in the BLAST algorithm. BLAST (Basic Local Alignment Search Tool) is a popular tool used for determining the patterns in genomic seq...


BitPAl: a bit-parallel, general integer-scoring sequence alignment algorithm

MOTIVATION Mapping of high-throughput sequencing data and other bulk sequence comparison applications have motivated a search for high-efficiency sequence alignment algorithms. The bit-parallel approach represents individual cells in an alignment scoring matrix as bits in computer words and emulates the calculation of scores by a series of logic operations composed of AND, OR, XOR, complement, ...


Whole genome alignments using MPI-LAGAN

Advances in sequencing technologies have substantially increased the number of fully sequenced genomes. Alignment algorithms play a crucial role in analyzing whole genomes, identifying similar and conserved regions between pairs of genomes, leading to annotation of genomes with site-specific properties and functions. In this work we introduce a parallel algorithm for a widely used whole genome ...



Journal title:
  • Bioinformatics

Volume 19, Issue 11

Pages -

Publication date 2003